EFFICIENT TOPOLOGY-CONTROLLED SAMPLING OF 

IMPLICIT SHAPES 

Jason Chang and John W. Fisher III 

\ Massachusetts Institute of Technology 

Deparment of Electrical Engineering and Computer Science 
^ ; 32 Vassar St. Cambridge, MA 02139 



C/2 



cn 

in 
o 



March 4, 2013 



\Q , Abstract 

Sampling from distributions of implicitly denned shapes enables analysis of various energy functionals 
used for image segmentation. Recent work [I] describes a computationally efficient Metropolis-Hastings 
' method for accomplishing this task. Here, we extend that framework so that samples are accepted at 

every iteration of the sampler, achieving an order of magnitude speed up in convergence. Additionally, 
we show how to incorporate topological constraints. 



1 Introduction 



^ . For many Bayesian inference tasks, evaluating marginal event probabilities may be more robust than com- 

' puting point estimates (e.g. the MAP estimate). Image segmentation, particularly when the signal-to- noise 

ratio (SNR) is low, is one such task. However, because the space of shapes is infinitely large, direct inference 
or sampling is often difficult, if not infeasible. In these cases, Markov chain Monte Carlo (MCMC) sampling 
approaches can be used to compute empirical estimates of marginal probabilities. Recently, Chang et al. pQ 
derived an efficient MCMC method for sampling from distributions of implicit shapes (i.e. level sets). We 
improve upon that algorithm in two ways. First, we improve the convergence rate by denning a Gibbs-like 
iteration in which every sample is accepted and, second, we demonstrate how to efficiently incorporate both 
local and global topological constraints on sample shapes. 

Many approaches formulate image segmentation as an energy optimization. One can derive a related 
. Bayesian inference procedure by viewing the energy functional E(£; x) as the log of a probability function 



p Llx (£\x)<xexp[±E(£;x)}, (1) 

where £ is the labeling associated with some segmentation, x is the observed image, and the ± in the exponent 
depends on whether one is maximizing or minimizing. In PDE-based level-set methods, £ = sign(y?), where 
if is the level-set function. In graph-based segmentation algorithms, such as ST-Mincut [2] [3] or Normalized 
Cuts [4], £ is the label assignment. Often, the energy functional decomposes into a data fidelity term and a 
regularization of the segmentation (Normalized Cuts being an exception). Bayesian formulations treat the 
former as the data likelihood and the latter as a prior on segmentations: 

p L \x(e\x)^p xlL (x\£)p L (£). (2) 

While both the data likelihood and prior terms are user-defined, the form of the prior varies considerably 
depending on the optimization method; a curve-length penalty is typically used in level-set methods while a 
neighborhood affinity is typically used in graph-based methods. We refer to these two forms as the PDE-based 
energy and the graph-based energy, respectively. 

Another form of prior information incorporates the topology of the segmented object. Han et al. [5] first 
showed this using topology-preserving level-set methods, where the topology of the segmented object was 
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not allowed to change. This methodology was later extended in [61 to topology-controlled level-set methods, 
where the topology was allowed to change but only within an allowable set of topologies. To our knowledge, 
topology-constrained sampling methods have not previously been considered. 

Current methods for sampling implicit shapes include [7], [8], and pQ which all use Metropolis-Hastings 
MCMC sampling procedures. As shown in pQ, the methods of [7] and [8] converge very slowly and cannot 
accommodate any topological changes. however, presented an algorithm called BFPS that acts directly on 
the level-set function. BFPS generates proposals by sampling a sparse set of delta functions and smoothing 
them with an FIR filter. Delta locations and heights were biased by the gradient of the energy to increase 
the Hastings ratio. Empirical results demonstrated that BFPS was orders of magnitude faster than [7] and 
[8], and representations, unlike the previous two methods, could change topology. Here, we show how to 
efficiently sample from a distribution over segmentations in both the PDE-based and graph-based energies. 
In contrast to previous MCMC samplers, proposals at each iteration are accepted with certainty, achieving 
an order of magnitude speed up in convergence. Additionally, we incorporate topological control so as to 
exploit prior knowledge of the shape topology. 



2 Metropolis-Hastings MCMC Sampling 

We begin with a brief discussion of two MCMC sampling algorithms (c.f. [9, for details). For notational 
convenience, we drop the dependence on x in distributions. We denote Pl\x(£) as the target distribution, i.e. 
the distribution from which we would like to sample. MCMC methods are applicable when one can compute 
a value proportional to the target distribution, but not sample from it directly. Distributions defined over 
the infinite-dimensional space of shapes fall into this category. MCMC methods address this problem by 
constructing a first-order Markov chain such that the stationary distribution of that chain is exactly the 
target distribution. For this condition to hold, it is sufficient to show that the chain is ergodic and satisfies 
detailed balance. The Metropolis-Hastings sampler (MH-MCMC) [9] is one such approach. At time t in 
the chain, the algorithm samples from some user-defined proposal distribution, ~ q(<p^ \^ t ~ l \ %), and 
assigns the transition probability: 

(t) w/ prob min(aW,l) 



a 



(p^ x ) w/ prob 1 — min(a^), 1) 
PL\x(j {t) ) q^-^^Kx) 



(4) 



where is the Hasting's ratio, and the segmentation labels are £^ = sign(if^) and i?( £_1 ) = sign((^^ -1 ^). 
As this transition probability satisfies detailed balance, the correct stationary distribution is ensured as long 
as the chain is ergodic. A single sample from the target distribution is then generated by repeating this 
proposal/ transit ion step until the chain converges. 



2.1 Gibbs Sampling 

Gibbs sampling is a special case of MH-MCMC, where the acceptance ratio, evaluates to one at every 
iteration. Gibbs proposals typically select a random dimension and sample from the target distribution 
conditioned on all other dimensions. Empirical evidence (as shown in Figure [I]) indicates that when learning 
appearance models (e.g. in [ID]), Gibbs sampling exhibits slow convergence times. Block Gibbs sampling, 
where a group of dimensions are sampled conditioned on all other dimensions, allows for larger moves. The 
proposed algorithm is related to this type of Gibbs sampling. 



3 Gibbs-Inspired Metropolis-Hastings 

BFPS [lj achieves a high acceptance ratio at each iteration by biasing the proposal by the gradient of the 
energy functional. Here, we design a similar proposal such that samples are accepted with certainty. Given 
some previous level-set function, Lp^~ l \ we generate a proposal, (p^ with the following steps: (1) sample a 
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Figure 1: An example of the Gibbs sampling getting stuck in a local extrema as compared to the presented 
GIMH-SS. 
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Figure 2: Mapping the masked level-set function (left) onto the real line (right). Possible ranges, r, and 
widths, /3, are shown on the right. 



mask, m, that selects a subset of pixels and (2) add a constant value, /, to all pixels within this mask. We 
can express this as 

<pW(m,f) = <p«-»+f-m, (5) 

where the mask, m, is a set of indicator variables with the same size as cp. The support of the mask can be 
of any shape and size, though in practice we use circles of random center and radius. Deferring the choice of 
/, we write the proposal likelihood as 

q(<p® l^-^j = PM(m)PF\XM*(f\rn, tp^). (6) 

Figure [2] shows a notional mask overlaid on a level-set function, with the height of ip plotted on the real 
line for pixels in the support of the mask. The dotted line marks the zero-height which splits pixels into 
foreground and background. As all pixels in the support of the mask are summed with the same constant, 
/, choosing an / is equivalent to shifting the zero- height by — /. The resulting proposed zero- height can fall 
into one of (|m| ± 1) different ranges, where \m\ counts the pixels in the support of the mask. The random 
range, of the perturbation can take on integers in [— n~, n + ], where ri^ count the pixels in the support of 
the mask that have height ip ^ 0. 

Sampling a perturbation, /, can then be decomposed into sampling a range followed by a value within 
that range. Most energy functionals only depend on the sign of pixels, which conveniently is determined 
solely by the range. If we choose PF\R,(f\ r ) = ~k~ (i- e - a uniform distribution), we can write the perturbation 
likelihood as 

PFR\XM$(f\m, (/? (£-1) ) = -rPR\xM<s>(r\m, (/? ( * _1) ), (7) 

where /3 r is the width or range r (note that r is deterministic conditioned on /). Because the value of / 
within a range r does not affect the sign of the resulting level-set function, the proposed labeling can be 
expressed as 

£W(m,r) =sign(0^(mj ^p F \ R (f\r)). (8) 
By choosing the following Gibbs-like proposal for the ranges, 

P^XM^Am,^-^) oc /3rPL\x(Z {t) (r,m)), (9) 

one can verify via Equations 2J and [7] that the Hastings ratio evaluates to one at every iteration. 

One subtlety exists in the selection of endpoints for the first and last range. Since these two ranges extend 
to ±oo, their corresponding f3 is also oo. We therefore restrict both of these ranges to be of some finite length 
Poo. While any finite value will suffice, in practice, we choose /?oo to be 1. With the endpoint selection for /3oo, 
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Figure 3: Example distributions for where the zero- height line can fall, (a) shows an example of a valid 
proposal, and (b) shows an example of an invalid proposal. 



another complication arises. Consider the illustrative distributions for the zero- height line shown in Figure 
[H In these examples, some constant fioo is chosen, and the current zero-height line is marked with the dotted 
line. We can consider these two distributions as two different mask selections. Figure l3al shows what we call 
a valid mask, in which the current zero- height line falls within the non-zero probability space. Figure [3b| 
however, shows what we call an invalid mask, in which the current zero-height line falls outside the non-zero 
probability space. Here, one could propose a forward perturbation such that the resulting proposed zero- 
height line falls in the non-zero probability space. However, the backward transition to bring the proposed 
zero- height line back to the current zero- height line would be zero because it lies in the zero-probability space. 
We note that if (3^ is chosen to be some constant C, and all pixels are initialized to have height \cpi\ < C, 
then every possible (initial) mask will be valid. As the chain progresses, some masks will become invalid, in 
which case we only sample valid masks. Because any mask that contains both positive and negative pixels is 
valid, there are always valid masks to sample from. 



3.1 Algorithm Description 

The resulting algorithm, called the Gibbs-Inspired Metropolis Hastings Shape Sampler (GIMH-SS), is de- 
scribed by the following steps: (1) Sample a circular mask, m, with a random center and radius; (2) Sample 
a range, r, according to Equation [9j (3) Sample a perturbation, /, uniformly in this range; (4) Compute 
(p^(m,f) using Equation [5] (5) Repeat from Step 1 until convergence. 

When calculating the likelihood of each range in Step 2, only a value proportional to the true likelihood is 
needed. Normalization can be performed after the fact because all ranges are enumerated. Equivalently, the 
energy, which is related to the likelihood via a logarithm, only needs to be calculated up to a constant offset. 
We therefore compute the change in the energy by including or excluding the pixels in each range. This can 
be efficiently computed for all ranges using two passes. On each pass, we begin at the range ro, setting this 
energy to zero and choosing to either go in the positive (first pass) or negative (second pass) direction. We 
then consider r±\ and compute the change in energy when the the one pixel changes sign. In graph-based 
energies, if an n- affinity graph is used, only n computations are required at each range evaluation, one for 
each edge. This procedure is iterated until all ranges are computed (i.e. up through r± n ±). 

Unlike graph-based energy functionals, the curve length penalty used in PDE-based energy functionals 
can be a non-local and fairly expensive computation. In optimization-based segmentation algorithms, the 
evolution of the level-set is based on the gradient of the curve length (i.e. the curvature) which can efficiently 
be calculated locally. If a subpixel-accurate curve length calculation is required, then the energy functional 
will indeed depend on the value of the perturbation, /, within a particular range, r. As an approximation, 
we compute a pixel- accurate curve length by first setting all pixels on the boundary to have height ±0.5, 
initializing a signed-distance function, and then calculating the curve length via 

£dl = jjipi) \Vipi\di n^Sto) |V^| , (10) 
and the following discrete approximation to the S(-) function 



; i (id 




If a first-order accurate signed-distance function is obtained using a fast marching method, the height at 
a particular pixel depends only on pixels in a 3x3 neighborhood. This dependence relationship is illustrated 
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Figure 4: (a) Signed distance function dependence and (b) curve length penalty dependence. 
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Figure 5: Average energy across 100 sample paths for various shape sampling algorithms. Original image 
shown on the left. 



in Figure |4aJ where changing the sign of the center black pixel will affect the height of the signed distance 
function at all the gray pixels. From equation fTQ], the local curve length calculation depends on the magnitude 
of the gradient of the level-set function. If a centered finite-difference is used for the x and y directions, then 
the local curve length computation depends on neighbors above, below, left, and right. Thus, if the heights 
are changed for the gray pixels in Figure HaJ the curve length computation changes for all the gray pixels in 
Figure [4bJ The change in the curve length for changing the sign of the center pixel is consequently a function 
of the sign of the 21 pixels in Figure l4bl We precompute all possible 2 21 (2 million) combinations so that 
the curve length penalty can be efficiently computed by a simple table lookup. 

3.2 Convergence Speeds 

To illustrate the performance of GIMH-SS, we run the sampling algorithm on the image shown in Figure [5] 
We note that this is the same image used in the comparison of [1 . We run each of the four algorithms for 10 4 
seconds. As in pQ, an approximate algorithm is used for [7] and [8] that does not necessarily ensure detailed 
balance. Certain random initializations require topological changes to converge to the correct segmentation, 
which neither [7] and [8] can not accomodate. In these situations, samples are accepted with a certain 
probably instead of instantly being rejected. We note that without this approximation, the algorithms would 
never converge to the correct solution. While BFPS exhibits running times that are orders of magnitude 
faster than [7] and [8], GIMH-SS is almost an order of magnitude faster than BFPS. The average energy 
obtained using GIMH-SS both rises faster and converges faster than BFPS. Additionally, the gradient of 
the energy functional (which potentially can be difficult to evaluate) is never needed in the proposal. These 
results combined with those found in [lj indicate that GIMH-SS draws samples approximately lOx faster 
than BFPS, 10 5 x faster than [8j, and 10 6 x faster than [7]. 

3.3 Relation to Block Gibbs Sampling 

Block Gibbs sampling first selects a mask of pixels (or dimensions) to changes, and then samples from the 
target distribution conditioned on all other pixels. In binary segmentation, a block of size \m\ would require 
one to evaluate an exponential number (2' m l) of different configurations, which is intractable for a reasonably 
sized mask. 

The GIMH-SS algorithm similarly selects a mask of pixels. The level-set function orders the masked 
pixels, in that, if a pixel of height +h changes sign, all pixels of height < cfi < h must as well. Consequently, 
this algorithm samples from a subset of the conditional distribution, resulting in a linear number (\m\ + 1) 
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Figure 6: Examples of topological numbers with (n,n) = (4,8^ 
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Figure 7: Splitting a region vs. destroying a handle while a pixel is added to BG. Topologal numbers are 
shown on right. 



of different configurations. We note that ergodicity is ensured because the ordering of pixels by the level-set 
function changes over time. 



4 Topology-Controlled Sampling 

In this section, we extend GIMH-SS to a topology-controlled sampler. The topology of a continuous, compact 
surface is often described by its genus (i.e. the number of "handles"). Digital topology [llj is the discrete 
counterpart of continuous topology, where regions are represented via binary variables on a grid. 

In digital topology, connectiveness must be defined for the foreground (FG) and background (BG) 
describing how pixels in a local neighborhood are connected. For example, in 2D, a 4-connected region 
corresponds to a pixel being connected to its neighbors above, below, left, and right. An 8-connected region 
corresponds to being connected to the eight pixels in a 3x3 neighborhood. Connectiveness must be jointly 
defined for the foreground (n) and background (n) to avoid topological paradoxes. As shown in [IT], valid 
connectivities for 2D are (n,n) G {(4,8), (8,4)}. Given a pair of connectiveness, the topological numbers [12] 
at a particular pixel, T n (for the FG) and Tjt (for the BG) count the number of connected components a pixel is 
connected to in a 3x3 neighborhood. Figure [6] shows a few neighborhoods with their corresponding topological 
numbers. While these topological numbers reflect topological changes, they do not distinguish splitting or 
merging of regions from the creation or destruction of a handle. Segonne [6] defines two additional topological 
numbers, T+ and T^, which count the number of unique connected components a pixel is connected to over 
the entire image domain. T+ and depend on how pixels are connected outside of the 3x3 region and 
allow one to distinguish all topological changes. 

By labeling each connected component in the foreground and background, Segonne shows that T+ can be 
computed efficiently when a pixel is added to the foreground and when a pixel is added to the background. 
In 3D, one must calculate when adding a pixel to the foreground, which can be computationally expensive. 
We show here that this calculation is not necessary in 2D. 

Consider the two pixels marked by o and A in Figure Removing the o pixel from the foreground 
splits the region and removing the A pixel destroys a handle. We emphasize that in this 2D case, both 
T n (°) + T n ( A ) and T t?(°) + T ^( A )- In 3D > this is generally not the case because for the A pixel, the 
two background regions can actually be connected in another 2D slice of the volume. In fact, in 2D, the 
destruction of a handle in the foreground corresponds directly to a merging of regions in the background. 
Likewise, the splitting of regions in the foreground corresponds directly to a creation of a handle in the 
background. Because of this one-to-one mapping, we do not need to compute the expensive when adding 
a pixel to the foreground. Table [T] summarizes the topological changes of the foreground and background in 
2D as a function of the four topological numbers. 
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Table 1: Topological changes as a function of topological numbers. 'C - Create; 'D' - Destroy; 'S' - Split; 'M' 
- Merge; 'H' - Handle(s); 'R' - Region(s); 'X' - any value; - no topological change. Omitted configurations 
are impossible in 2D. 



In this section, we summarize how to sample from the space of segmentations while enforcing topology con- 
straints (c.f. [?] for details). While the goal is similar to [6], a simple alteration of the level-set velocity in an 
MCMC framework will not preserve detailed balance. A naive approach generates proposals using GIMH-SS 
and rejects samples that violate topology constraints. Such an approach wastes significant computation gen- 
erating samples that are rejected due to their topology. We take a different approach: only generate proposed 
samples from the set of allowable topological changes. Recalling the discussion of ranges in GIMH-SS, one 
can determine which ranges correspond to allowable topologies and which do not. Ranges corresponding 
to restricted topologies have their likelihood set to zero in a topology-controlled version of GIMH-SS (TC- 
GIMH-SS). The methodology presented here treats the topology as a hard constraint. A distribution over 
topologies could be implemented by weighting ranges based on topologies rather than eliminating restricted 
ones. 

Similar to GIMH-SS, TC-GIMH-SS also makes two passes to calculate the likelihoods of the ranges. 
Starting at ro, we either proceed in the positive or negative direction. Because ro corresponds to the range 
that does not change the sign of any pixels, ro will never correspond to a topological change. As the algorithm 
iterates through the possible ranges, a list of pixels that violate a topological constraint is maintained. If this 
violated list is empty after a range is considered, then the range corresponds to an allowable topology. At 
each iteration, while any pixel in the violated list is allowed to change sign, it is removed and all neighboring 
pixels in the violated list are checked again as their topological constraint may have changed. In this process, 
for each range, one must maintain the labels of each connected component. 

5 Results 

While TC-GIMH-SS applies to almost any PDE-based or graph-based energy functional, we make a spe- 
cific choice for demonstration purposes. Similar to [10 , we learn nonparametric probability densities over 
intensities and combine mutual information with a curve length penalty. We impose four different topology 
constraints on the foreground: unconstrained (GIMH-SS), topology-preserving (TP), genus-preserving (GP), 
and connected-component-preserving (CCP). The TP sampler does not allow any topology changes, the GP 
sampler only allows the splitting and merging of regions, and the CCP sampler only allows the creation and 
destruction of handles. Typical samples from each of these constraints are shown in Figure [HI When the 
topology constraint is incorrect for the object (e.g. using TP or GP), the resulting sample may be undesirable 
(e.g. creating an isthmus connecting the two connected components of the background). When the topology 
is correct, however, more robust results can be obtained. For example, the CCP constraint removes some 
noise in the background. 

The usefulness of the topology constraint relies on a valid initialization. Histogram images [7] display 
the marginal probability that a pixel belongs to the foreground or background where lighter colors equate to 
higher foreground likelihood. The computed histogram images are shown in Figure [9] for each of the topology 
constraints. We initialize the samples either using a random circle containing the foreground (FG Init), or 



4.1 Topology-Controlled GIMH-SS 
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Figure 8: Example samples imposing different topology constraints. 
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Random 
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Figure 9: Histogram images using different initializations. 



a random circle placed anywhere in the image (Random Init). While not always true, incorrect topology 
constraints are sometimes mitigated when looking at marginal statistics. For example, in the FG initialization 
case, the isthmus in the TP and GP constraints is no longer visible. Additionally, if the initialization only 
captures one connected component of the background (which is possible with random initialization), certain 
samplers prohibiting splitting regions (TP and CCP) will not be able to capture the entire region. This is 
reflected in the histogram image with the gray center. 

Consider the low SNR image of Figure [101 Chang et al [Tj demonstrated that sampling improves results 
when compared to optimization based methods in low SNR cases. When the problem is less likelihood 
dominated, as in this case, the prior has greater impact. The top row of Figure [10] shows the histogram 
images obtained using each of the topological constraints. One can see remnants of the isthmuses in the 
TP and GP cases. Thresholding the normalized histogram image at a value t reveals the t th quant ile of the 
segmentation. For example, if t = 0.9, the resulting foreground region of the thresholded histogram contains 
pixels that are in the foreground for at least 90% of the samples. We show the 95%, 50%, and 5% quantile 
segmentations in Figure [TUJ Since reducing the threshold never shrinks the foreground segmentation, we can 
overlay these quant iles on top of each other. In the 5% quantile, we can clearly see the isthmuses in the TP 
and GP cases. This result is poor because the wrong topology (i.e. the wrong prior) was used. However, if we 




BHBB 



Quantiles 
I 195% 
OH 50% 

5% 




GIMH-SS 



Figure 10: Low SNR image using FG Init. Histograms are shown above, and quantiles (thresholded his- 
tograms) are shown below. 



8 



Draft Report - Do Not Distribute or Cite 




Original GIMH-SS TP GP CCP 



Figure 11: Example images illustrating the utility of topology priors. 

use the CCP constraint, we are able to improve results as compared to the unconstrained case by removing 
a lot of the background noise. 

The CCP constraint is particularly useful when an unknown number of handles exist (e.g. deformable 
objects). Objects with a known number of handles in 3D projected onto a 2D plane can have any number of 
handles. We show two example images of a human and the resulting thresholded histogram image in Figure 
[TT1 In the first image, the handles formed by the arms are not captured well with TP and GP. In the second 
image, the vignetting causes the unconstrained and the GP to group some background with foreground. 

6 Conclusion 

We have presented a new MCMC sampler for implicit shapes. We have shown how to design a proposal 
such that every proposed sample is accepted. Unlike previous methods, GIMH-SS efficiently samples PDE- 
based and graph-based energy functionals and does not require the evaluation of the gradient of the energy 
functional (which may be hard to compute). Additionally, GIMH-SS was extended to include hard topological 
constraints by only proposing samples that are topologically consistent with some prior. 
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